Evaluation of Sibel’s Advanced Neonatal Epidermal (ANNE) wireless continuous physiological monitor in Nairobi, Kenya

Background Neonatal multiparameter continuous physiological monitoring (MCPM) technologies assist with early detection of preventable and treatable causes of neonatal mortality. Evaluating the accuracy of novel MCPM technologies is critical for their appropriate use and adoption. Methods We prospectively compared the accuracy of Sibel’s Advanced Neonatal Epidermal (ANNE) technology with Masimo’s Rad-97 pulse CO-oximeter with capnography and Spengler’s Tempo Easy reference technologies during four evaluation rounds. We compared accuracy of heart rate (HR), respiratory rate (RR), oxygen saturation (SpO2), and skin temperature using Bland-Altman plots and root-mean-square deviation (RMSD) analyses. Sibel’s ANNE algorithms were optimized between each round. We created Clarke error grids with zones of 20% to aid with clinical interpretation of HR and RR results. Results Between November 2019 and August 2020 we collected 320 hours of data from 84 neonates. In the final round, Sibel’s ANNE technology demonstrated a normalized bias of 0% for HR and 3.1% for RR, and a non-normalized bias of -0.3% for SpO2 and 0.2°C for temperature. The normalized spread between 95% upper and lower limits-of-agreement (LOA) was 4.7% for HR and 29.3% for RR. RMSD was 1.9% for SpO2 and 1.5°C for temperature. Agreement between Sibel’s ANNE technology and the reference technologies met the a priori-defined thresholds for 95% spread of LOA and RMSD. Clarke error grids showed that all HR and RR observations were within a 20% difference. Conclusion Our findings suggest acceptable agreement between Sibel’s ANNE and reference technologies. Clinical effectiveness, feasibility, usability, acceptability, and cost-effectiveness investigations are necessary for large-scale implementation.

Introduction Globally, neonatal mortality remains high with over 2.4 million deaths in 2019, the majority in resource-constrained settings [1]. In Sub-Saharan Africa, most neonatal mortality stems from largely preventable and treatable causes of death, including preterm birth, asphyxia, and infectious diseases [2]. Early detection and treatment of these life-threatening conditions using multiparameter continuous physiological monitoring (MCPM) technologies are critical to improving quality of care and averting deaths [3][4][5][6]. Currently, MCPM technologies are not commonly available at labor and delivery sites in resource-constrained settings, in part due to the high cost of equipment and lack of trained personnel [7]. The Evaluation of Technologies for Neonates in Africa (ETNA) is an African-based technology-testing platform established to optimize neonatal technologies and improve neonatal health outcomes in resource-constrained settings. ETNA endeavours to understand real-world clinical feasibility, performance, and accuracy of novel technologies. The current study analyzes the clinical accuracy of an investigational MCPM technology compared to verified reference technologies [8].

Study design and procedures
We conducted an iterative prospective study to assess agreement of heart rate (HR), respiratory rate (RR), peripheral oxygen saturation (SpO2), and chest skin surface temperature measurements from Sibel's Advanced Neonatal Epidermal (ANNE) investigational technology (Sibel Inc., IL, USA) with those from reference technologies. We conducted the study at Aga Khan University, Nairobi (AKU-N), a tertiary healthcare facility in Kenya. Sibel's ANNE vital signs monitoring platform includes two neonatal-sized, non-invasive, adhesive sensors attached directly to the skin surface that continuously measure and record HR, RR, SpO2, and skin surface temperature (S1 Fig). Up to 30 hours of data are stored locally within the sensor and wirelessly transmitted to a central database supported by customized software. We compared Sibel's ANNE HR, RR, and SpO2 measurements with those from Masimo's Rad-97 pulse CO-oximeter with capnography (Masimo Corporation, USA) as reference. RR from the reference technology was measured by capnography, using an infant/pediatric nasal cannula to sample the neonate's exhaled carbon dioxide (CO2). We compared Sibel's ANNE temperature measurements with those measured using Spengler's Tempo Easy non-contact infrared thermometer (SPENGLER HOLTEX Group, Aix-en-Provence, France) as reference.
To identify agreement thresholds for comparison with Sibel's ANNE technology, we assessed functionality and estimated within- and between-neonate variability while verifying Masimo's Rad-97 technology [8]. We ran an initial round of open-label data collection from both Sibel's ANNE technology and the reference technologies to test the accuracy testing methods. In the open-label round, reference data for HR, RR, and SpO2 were shared with Sibel before analysis. These data included 1 hertz (Hz) trend data (HR, RR, and SpO2 values), the raw plethysmograph waveform, signal quality data, and the capnography CO2 waveform from Masimo's Rad-97 technology. We then conducted three rounds of closed-label testing and analyses. After each subsequent round of data analysis, Sibel was provided with all reference technology datasets so that it could improve its detection and measurement algorithms. The study's primary outcome was agreement between the HR, RR, SpO2, and temperature measurements for Sibel's ANNE technology and the reference technologies. We hypothesized that Sibel's ANNE technology would show good agreement within a priori-defined thresholds for each vital sign measurement and minimal bias when compared to the reference technologies.
Trained study clinicians recruited, obtained informed consent, and enrolled eligible neonates from the neonatal intensive care unit (NICU), neonatal high dependency unit (NHDU), and postnatal and maternity wards at AKU-N (Table 1). Neonates were simultaneously

Data processing and selection
HR, RR, and SpO2 data were collected from Masimo's Rad-97 technology in real time with a custom Android (Google LLC, Mountain View, USA) application. Temperature data were entered manually into a REDCap data collection application [9]. HR, RR, and SpO2 data were parsed in C (Dennis Ritchie & Bell Labs, USA) to obtain plethysmograph waveform and plethysmograph quality index (PO-SQI) data at 62.5 Hz and capnography waveform data at approximately 20 Hz. Instantaneous HR was obtained from the timing of the PO-SQI, which was calculated by Masimo's Rad-97 technology for each heartbeat. We analyzed the CO2 waveform data using a breath detection algorithm developed in MATLAB (MathWorks, USA) based on adaptive pulse segmentation, which has been validated internally and on the CapnoBase database [10] and is accurate to within ±5% for a neonate breathing at 60 breaths/minute [11]. The timing of detected breaths allowed breath duration to be calculated. An algorithm calculated the RR median for each epoch (Table 1). The custom MATLAB algorithm also provided a capnography quality index (CO2-SQI) based on capnography features. Values for SpO2 were provided by Masimo's Rad-97 at 1 Hz. We performed manual RR counting from capnography in the reference technology. Two trained observers independently reviewed plotted capnogram waveforms and counted all breaths within each epoch based on standardized rules. The independent counts were averaged; if the number of breaths counted varied by more than three breaths, a third trained observer also counted the breaths, and the two closest results were averaged.
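The manual counting rule described above can be sketched in Python as follows. This is an illustrative sketch, not the study's code: the function name, the `max_gap` parameter, and the tie-breaking behavior when two pairs are equally close are assumptions that the text does not specify.

```python
from itertools import combinations
from typing import Optional


def adjudicated_breath_count(
    count_a: int,
    count_b: int,
    count_c: Optional[int] = None,
    max_gap: int = 3,  # "more than three breaths" triggers a third count
) -> float:
    """Combine independent manual breath counts for one epoch."""
    # Two observers agree closely enough: average their counts.
    if abs(count_a - count_b) <= max_gap:
        return (count_a + count_b) / 2
    # Otherwise a third observer's count is required.
    if count_c is None:
        raise ValueError("counts differ by more than max_gap; third count needed")
    # Average the two closest counts among the three.
    # Assumption: on a tie, the first closest pair found is used.
    closest = min(
        combinations((count_a, count_b, count_c), 2),
        key=lambda pair: abs(pair[0] - pair[1]),
    )
    return sum(closest) / 2
```

For example, counts of 58 and 60 are averaged directly, whereas counts of 50 and 60 require a third count; if that count is 58, the two closest results (60 and 58) are averaged.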
Measurements of HR, RR, and SpO2 from Sibel's ANNE technology were sampled at between 128 and 512 Hz from the output signal and down-sampled to provide values at 1 Hz. Temperature measurements were conducted once every 10 minutes. To evaluate agreement, we randomly selected 60-second HR, RR, and SpO2 epochs with sufficient signal quality (S1 Table). All temperature measurements were included in the analysis.
To calculate sample size for each closed-label round, we estimated that 20 neonates with ten replications each would provide a 95% upper and lower LOA between two methods of ±0.76 times the standard deviation (SD) of their differences. Tight confidence intervals (CI) require roughly 100 to 200 samples, which is generally sufficient for method comparison studies [12].
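The ±0.76 figure is consistent with a commonly used large-sample approximation for the 95% confidence interval around a limit of agreement, ±1.96 × sqrt(3/n) × SD, evaluated at n = 20. The Python sketch below assumes that approximation, which the text does not state explicitly, and treats n as the number of independent subjects. The function name is illustrative.

```python
import math


def loa_ci_half_width(n: int, z: float = 1.96) -> float:
    """Approximate 95% CI half-width of a limit of agreement, in SD units.

    Assumption: the standard error of a limit of agreement is taken as
    sqrt(3 / n) times the SD of the differences (large-sample approximation).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(3 / n)


if __name__ == "__main__":
    for n in (20, 100, 200):
        print(n, round(loa_ci_half_width(n), 2))
```

Under the same assumption, 100 and 200 samples give half-widths of roughly 0.34 and 0.24 SD, which illustrates why sample sizes in that range yield tight intervals.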
Mean HR, RR, and SpO2 values for the selected epochs were calculated (

Statistical analysis
To determine the normalized agreement between Sibel's ANNE and reference technologies, we calculated the normalized bias (95% CI) and spread between the 95% limits of agreement (LOA) by dividing the bias and spread between the 95% LOA by the overall mean reference value [13]. Based on Masimo's Rad-97 reference technology verification phase, the acceptable a priori-defined spread between the 95% upper and lower LOA of 30%, approximately equivalent to a root-mean-square deviation (RMSD) of 8, was selected for both RR and HR [8]. RMSD was calculated for each vital sign. We selected RMSD thresholds of ≤ 3.5% for SpO2 and ≤ 1.5°C for temperature, with a spread between the 95% upper and lower LOA of ≤ 4.5°C, based on a review of the literature and internal reference technology testing completed during the verification phase of the study [8]. Clarke error grids were constructed with zones of 20% discrepancy to improve clinical interpretability of RR and HR results.
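The quantities named above can be illustrated with a minimal Python sketch. It is not the authors' R analysis: it assumes limits of agreement of bias ± 1.96 × SD of the paired differences and ignores the repeated measurements per neonate, which a full analysis would need to account for. The function and key names are hypothetical.

```python
import math
from statistics import mean, stdev


def agreement_summary(test, reference):
    """Bias, 95% LOA spread, normalized values, and RMSD for paired data."""
    if len(test) != len(reference) or len(test) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    diffs = [t - r for t, r in zip(test, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    spread = upper - lower
    ref_mean = mean(reference)  # overall mean reference value
    rmsd = math.sqrt(mean([d * d for d in diffs]))
    return {
        "bias": bias,
        "loa_lower": lower,
        "loa_upper": upper,
        "loa_spread": spread,
        # Normalization: divide by the overall mean reference value.
        "normalized_bias_pct": 100 * bias / ref_mean,
        "normalized_spread_pct": 100 * spread / ref_mean,
        "rmsd": rmsd,
    }
```

A result could then be compared with the thresholds in the text, for example checking that `normalized_spread_pct` does not exceed 30 for HR or RR.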
All analyses were conducted using R (version 3.6.2) with the following packages: readr (version 1.3.1), data.
In the open-label analysis round, 140 epochs were selected from nine neonates for RR, 153 epochs from 10 neonates for HR, 84 epochs from seven neonates for SpO2, and 28 measurements from 10 neonates for temperature. A total of 81.5% of the data from Sibel's ANNE technology was considered of sufficient quality in the open-label round, compared with 75.7% of the data from the reference technology (S1 Table). During each closed-label round, 10 epochs were selected from a minimum of 20 neonates for HR, RR, SpO2, and temperature, resulting in 200 measurement pairs per vital sign per round. More data from Sibel's ANNE technology were accepted as being of sufficient quality in each of the closed-label rounds, compared with the data from the reference technology (round 1: ANNE = 78.4% vs 63.3%; round 2: ANNE = 56.5% vs 50.1%; round 3: ANNE = 84.0% vs 76.1%). No overlapping epochs were included in any of the analysis rounds.
Analysis of the HR data showed a small positive normalized average bias (range 0 to 2.2%) with a normalized spread of LOA meeting or surpassing the a priori-defined threshold in each round (Table 2; Fig 1). We observed a decrease in the normalized spread between 95% LOA (16.2 to 4.7%) and RMSD (4.3 to 1.2%) between closed-label rounds two and three. All Sibel's ANNE HR measurements were within 20% of Masimo's Rad-97 values (Fig 2A, region A, Clarke error grid).
RR analyses showed a large variation in average bias across rounds for Sibel's ANNE technology compared to Masimo's Rad-97 for both manually counted RR (range -12.9 to 1.5 breaths/minute) and algorithm-derived RR median (range -13.0 to -0.7 breaths/minute) values (Table 2; Fig 3). The normalized spread between 95% LOA decreased between the second and third closed-label rounds for manual RR count (110.1 to 29.3%) and median RR (110.3 to 20.6%), thereby meeting the a priori-defined threshold for both methods of calculating RR. Absolute and normalized spreads of 95% LOA for median RR values were smaller than manually counted RR values in all rounds. All Sibel's ANNE RR measurements were within 20% of Masimo's Rad-97 values (Fig 2B, region A, Clarke error grid).
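The "within 20%" criterion for region A used for the HR and RR results can be expressed as a simple check. The Python sketch below covers only region A, defined as a test value within 20% of the reference value; the remaining regions depend on clinical thresholds not given here. Function names are illustrative.

```python
def in_region_a(test_value: float, reference_value: float,
                tolerance: float = 0.20) -> bool:
    """True if the test value is within `tolerance` of the reference value."""
    if reference_value <= 0:
        raise ValueError("reference value must be positive")
    return abs(test_value - reference_value) <= tolerance * reference_value


def fraction_in_region_a(pairs, tolerance: float = 0.20) -> float:
    """Share of (test, reference) pairs that fall in region A."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no data pairs supplied")
    hits = sum(in_region_a(t, r, tolerance) for t, r in pairs)
    return hits / len(pairs)
```

With a reference RR of 60 breaths/minute, a test value of 50 falls in region A while a test value of 45 does not.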
SpO2 analysis showed minimal change in bias (range -0.3 to 2.7%) for Sibel's ANNE technology compared to Masimo's Rad-97, with the largest change occurring between the second and third closed-label rounds (2.7 to -0.3%; Table 2; Fig 4). The RMSD increased between the open-label and second closed-label rounds (2.6 to 4.4%), followed by a decrease to 1.9% between the second and third closed-label rounds, meeting the a priori-defined threshold.
Skin surface temperature analysis showed minimal bias and bias change (range -0.1 to 0.5°C) between rounds for Sibel's ANNE technology compared to Spengler's Tempo Easy reference technology (Table 2; Fig 5). The RMSD for temperature increased from round to round (0.5 to 1.5°C) but met the a priori-defined accuracy threshold in each round.

Discussion
The a priori-defined agreement thresholds for neonatal HR, RR, SpO2, and skin surface temperature measurements were met after completing three rounds of closed-label analyses comparing Sibel's ANNE technology and the reference technologies. Between the open and closed rounds, Sibel modified the HR-detection algorithm by adding edge case handlers in the ECG signal where significant motion artifact was detected. Between closed-label rounds two and three, Sibel's ANNE chest sensor software algorithms were augmented to interrogate bioimpedance measurements for improved RR calculations. A modified calibration factor was also implemented for Sibel's ANNE limb sensor at this stage. Following these modifications, the normalized spreads between 95% LOA for HR, RR, and SpO2 decreased and there was a reduction in bias for all vital signs.

PLOS ONE
Evaluating ANNE: Accuracy testing of a wireless continuous physiological monitor in Nairobi, Kenya

Comparison of Sibel ANNE respiratory rate (RR) to Masimo Rad-97 RR manual count. Each dot represents a data pair, with the color intensity proportional to density of data pairs. Region A (in green) contains data pairs that are within 20% of the Masimo Rad-97 device value. Region B (in yellow) contains data pairs not within 20% that would not lead to unnecessary treatment. Regions C, D and E are in red. C includes data pairs leading to unnecessary treatment. D includes data pairs with a failure in detecting low or high HR/RR events and E includes data pairs where low and high HR/RR events are confused. https://doi.org/10.1371/journal.pone.0267026.g002

A normalized ±30% spread of 95% LOA for HR and RR was selected using real-world data obtained from neonates during Masimo's Rad-97 reference technology verification phase [8]. A similar LOA threshold, proposed in 1999, has been widely accepted and used extensively to judge agreement of new methods in cardiac output method comparison studies [14]. For a neonate breathing at 60 breaths/minute with a within-neonate variation of 2 breaths/minute, a 30% spread of LOA would equate to 3.3% variation. The Clarke error grids suggest that it is unlikely that treatment decisions would have significantly changed based on the differences between simultaneous observations made by the two technologies.
Capnography has superior performance at higher RR, which is common in neonates, and was chosen as the reference standard for measuring RR [15]. Using a standardized protocol to carefully count breaths from capnograms allowed manually counted values to be compared with the RR values provided by Sibel's ANNE technology. We found that the accuracy of RR comparisons depended on the correct placement of Sibel's ANNE sensors. The improved agreement seen in closed-label round three was likely due in part to moving the chest sensor from a horizontal placement across the central sternum to a 45-degree angle, with one end on the xiphoid process and the other end on the abdomen. This change strengthened the bioimpedance signal used to derive RR in neonates, after which RR agreement improved sufficiently to meet the agreement threshold.
Optimizing Sibel's ANNE algorithm between closed-label rounds two and three also resulted in large improvements in SpO2 accuracy compared to the reference technology. These changes were introduced upon recognizing that the enrolled neonates had darker skin tones than those previously evaluated with Sibel's ANNE technology. The SpO2 accuracy improved after the photoplethysmography light emission was increased.
Surface thermometers do not reflect core body temperature due to their physical distance from the core [16]. The results from the skin surface temperature comparison showed agreement steadily decreasing between analysis rounds. The large spread in 95% LOA in closed-label round three might be due to three of the 84 (3.6%) temperature values being outside of the 95% upper and lower LOA by more than 5 degrees. The outlier values may reflect noncompliance with measurement procedures rather than the accuracy of the technology, but this cannot be verified.
A strength of this study is the non-Sibel investigators' independence in study design, data collection, and analyses. Further, we tested accuracy in the population where the technologies will be used. This led us to discover the impact of darker pigmentation. Our findings are supported by the raw, high-resolution photoplethysmography and capnography data, manual counting of breaths by two independent reviewers, and the randomized selection of comparison epochs. However, the AKU-N study site is relatively highly resourced and Sibel's ANNE technology may have performed differently in lower-resourced settings. Our recently completed clinical feasibility evaluation of Sibel's ANNE technology took place at a publicly funded, high-volume maternity hospital that is more typical of resource-constrained settings. Usability, acceptability, accuracy, and agreement when identifying critical clinical events were also evaluated in this lower-resourced setting.

Sibel's ANNE technology is portable, lightweight, and non-invasive, and can be battery powered, wireless, and worn during kangaroo mother care. Its only disposable component is hydrogel adhesive. Of note, data from critically ill neonates with higher or irregular HR, RR, SpO2, or temperature readings could affect Sibel's ANNE sensor performance and impact accuracy comparisons; future accuracy evaluation of Sibel's ANNE technology in neonates in intensive or critical care will be necessary.

Limitations
This study has several limitations. Approximately one-third (36.8%) of neonate recordings were excluded from the analysis. This was in part due to some fragile neonates not tolerating the nasal cannula of Masimo's Rad-97 reference technology. Exclusion due to nasal cannula usage was not a concern with Sibel's ANNE technology because RR is collected from the chest sensor. Electrical outages further affected data quality and duration, contributing to data loss. Furthermore, only epochs with the highest quality reference data were chosen for analysis in order to minimize uncertainty. Bias could have been introduced by the breath detection algorithm during the creation of the capnography quality index (CO2-SQI), which was essential because capnography signal quality was not provided by the reference device. No clinical correlations or outcomes were analyzed, as many of the neonates in this study were healthy or relatively healthy.
The accuracy of Sibel's ANNE non-invasive MCPM technology is promising; however, additional research is required prior to large-scale implementation. This could include investigations of clinical care process improvements, clinical outcomes, clinical feasibility, usability, acceptability, cost-effectiveness, and clinical effectiveness. The development of a neonatal MCPM suitable for use in resource-constrained settings that can accurately monitor HR, RR, SpO2, and skin surface temperature has promising implications for clinical practice.